First Things First: Demystifying Data Analysis

To improve student achievement results, use data to focus on a few simple, specific goals.
Mike Schmoker
I recently sat with a district administrator eager to understand her district's achievement results. Pages of data and statistical breakdowns covered the table. Looking somewhat helpless, she threw up her hands and asked me, "What do I do with all this?"
Many educators could empathize with this administrator. The experts' tendency to complicate the use and analysis of student achievement data often ensures that few educators avail themselves of data's simple, transparent power. The effective use of data depends on simplicity and economy.
First things first: Which data, well analyzed, can help us improve teaching and learning? We should always start by considering the needs of teachers, whose use of data has the most direct impact on student performance. Data can give them the answer to two important questions:

· How many students are succeeding in the subjects I teach?
· Within those subjects, what are the areas of strength or weakness?
The answers to these two questions set the stage for targeted, collaborative efforts that can pay immediate dividends in achievement gains.

Focusing Efforts

Answering the first question enables grade-level or subject-area teams of practitioners to establish high-leverage annual improvement goals—for example, moving the percentage of students passing a math or writing assessment from a baseline of 67 percent in 2003 to 72 percent in 2004. Abundant research and school evidence suggest that setting such goals may be the most significant act in the entire school improvement process, greatly increasing the odds of success (Little, 1987; McGonagill, 1992; Rosenholtz, 1991; Schmoker, 1999, 2001).
If we take pains to keep the goals simple and to avoid setting too many of them, they focus the attention and energies of everyone involved (Chang, Labovitz, & Rosansky, 1992; Drucker, 1992; Joyce, Wolf, & Calhoun, 1993). Such goals are quite different from the multiple, vague, ambiguous goal statements that populate many school improvement plans.

Turning Weakness into Strength

After the teacher team has set a goal, it can turn to the next important question: Within the identified subject or course, where do we need to direct our collective attention and expertise? In other words, where do the greatest number of students struggle or fail within the larger domains? For example, in English and language arts, students may have scored low in writing essays or in comprehending the main ideas in paragraphs. In mathematics, they may be weak in measurement or in number sense.
Every state or standardized assessment provides data on areas of strength and weakness, at least in certain core subjects. Data from district or school assessments, even gradebooks, can meaningfully supplement the large-scale assessments. After team members identify strengths and weaknesses, they can begin the real work of instructional improvement: the collaborative effort to share, produce, test, and refine lessons and strategies targeted to areas of low performance, where more effective instruction can make the greatest difference for students.

So What's the Problem?

Despite the importance of the two questions previously cited, practitioners can rarely answer them. For years, during which data and goals have been education bywords, I have asked hundreds of teachers whether they know their goals for the academic year and which of the subjects they teach have the lowest scores. The vast majority of teachers don't know. Even fewer can answer the question: What are the low-scoring areas within a subject or course you teach?
Nor could I. As a middle and high school English teacher, I hadn't the foggiest notion about these data—from state assessments or from my own records. This is the equivalent of a mechanic not knowing which part of the car needs repair.
Why don't most schools provide teachers with data reports that address these two central questions? Perhaps the straightforward improvement scheme described here seems too simple to us, addicted as we are to elaborate, complex programs and plans (Schmoker, 2002; Stigler & Hiebert, 1999).

Over-Analysis and Overload

The most important school improvement processes do not require sophisticated data analysis or special expertise. Teachers themselves can easily learn to conduct the analyses that will have the most significant impact on teaching and achievement.
The extended, district-level analyses and correlational studies some districts conduct can be fascinating stuff; they can even reveal opportunities for improvement. But they can also divert us from the primary purpose of analyzing data: improving instruction to achieve greater student success. Over-analysis can contribute to overload—the propensity to create long, detailed, "comprehensive" improvement plans and documents that few read or remember. Because we gather so much data and because they reveal so many opportunities for improvement, we set too many goals and launch too many initiatives, overtaxing our teachers and our systems (Fullan, 1996; Fullan & Stiegelbauer, 1991).

Formative Assessment Data and Short-Term Results

A simple template for a focused improvement plan with annual goals for improving students' state assessment scores would go a long way toward solving the overload problem (Schmoker, 2001), and would enable teams of professional educators to establish their own improvement priorities, simply and quickly, for the students they teach and for those in similar grades, courses, or subject areas.
Using the goals that they have established, teachers can meet regularly to improve their lessons and assess their progress using another important source: formative assessment data. Gathered every few weeks or at each grading period, formative data enable the team to gauge levels of success and to adjust their instructional efforts accordingly. Formative, collectively administered assessments allow teams to capture and celebrate short-term results, which are essential to success in any sphere (Collins, 2001; Kouzes & Posner, 1995; Schaffer, 1988). Even conventional classroom assessment data work for us here, but with a twist. We don't just record these data to assign grades each period; we now look at how many students succeeded on that quiz, that interpretive paragraph, or that applied math assessment, and we ask ourselves why. Teacher teams can now "assess to learn"—to improve their instruction (Stiggins, 2002).
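As a rough illustration (the class names, scores, cut score, and 72 percent goal below are invented, not taken from any school in the article), a few lines of Python show the kind of simple tally a team might keep to turn a common formative assessment into the percent-passing figure it tracks against its annual goal:

# Hypothetical scores from one teacher team's common formative assessment (0-100 scale).
scores_by_class = {
    "Period 1": [82, 65, 74, 91, 58, 77],
    "Period 2": [70, 88, 62, 95, 73, 81],
}

PASSING_SCORE = 70   # assumed cut score on the common assessment
ANNUAL_GOAL = 72.0   # assumed team goal: 72 percent of students passing

all_scores = [s for scores in scores_by_class.values() for s in scores]
passing = sum(1 for s in all_scores if s >= PASSING_SCORE)
percent_passing = 100.0 * passing / len(all_scores)

print(f"Students passing: {passing}/{len(all_scores)} ({percent_passing:.0f}%)")
print("On track for the annual goal" if percent_passing >= ANNUAL_GOAL
      else "Below the goal -- a prompt to ask why and adjust instruction")
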
A legion of researchers from education and industry have demonstrated that instructional improvement depends on just such simple, data-driven formats—teams identifying and addressing areas of difficulty and then developing, critiquing, testing, and upgrading efforts in light of ongoing results (Collins, 2001; Darling-Hammond, 1997; DuFour, 2002; Fullan, 2000; Reeves, 2000; Schaffer, 1988; Senge, 1990; Wiggins, 1994). It all starts with the simplest kind of data analysis—with the foundation we have when all teachers know their goals and the specific areas where students most need help.

What About Other Data?

In right measure, other useful data can aid improvement. For instance, data on achievement differences among socio-economic groups, on students reading below grade level, and on teacher, student, and parent perceptions can all guide improvement.
But data analysis shouldn't result in overload and fragmentation; it shouldn't prevent teams of teachers from setting and knowing their own goals and from staying focused on key areas for improvement. Instead of overloading teachers, let's give them the data they need to conduct powerful, focused analyses and to generate a sustained stream of results for students.

References

Chang, Y. S., Labovitz, G., & Rosansky, V. (1992). Making quality work: A leadership guide for the results-driven manager. Essex Junction, VT: Omneo.
Collins, J. (2001, October). Good to great. Fast Company, 51, 90–104.
Darling-Hammond, L. (1997). The right to learn: A blueprint for creating schools that work. New York: Jossey-Bass.
Drucker, P. (1992). Managing for the future: The 1990s and beyond. New York: Truman Talley Books.
DuFour, R. (2002). The learning-centered principal. Educational Leadership, 59(8), 12–15.
Fullan, M. (1996). Turning systemic thinking on its head. Phi Delta Kappan, 77(6), 420–423.
Fullan, M. (2000). The three stories of education reform. Phi Delta Kappan, 81(8), 581–584.
Fullan, M., & Stiegelbauer, S. (1991). The new meaning of educational change. New York: Teachers College Press.
Joyce, B., Wolf, J., & Calhoun, E. (1993). The self-renewing school. Alexandria, VA: ASCD.
Kouzes, J., & Posner, B. (1995). The leadership challenge. San Francisco: Jossey-Bass.
Little, J. W. (1987). Teachers as colleagues. In V. Richardson-Koehler (Ed.), Educator's handbook. White Plains, NY: Longman.
McGonagill, G. (1992). Overcoming barriers to educational restructuring: A call for "system literacy." ERIC Document No. ED 357 512.
Reeves, D. (2000). Accountability in action. Denver, CO: Advanced Learning Press.
Rosenholtz, S. J. (1991). Teacher's workplace: The social organization of schools. New York: Teachers College Press.
Schaffer, R. H. (1988). The breakthrough strategy: Using short-term successes to build the high-performing organization. New York: Harper Business.
Schmoker, M. (1999). Results: The key to continuous school improvement (2nd ed.). Alexandria, VA: ASCD.
Schmoker, M. (2001). The results fieldbook: Practical strategies from dramatically improved schools. Alexandria, VA: ASCD.
Schmoker, M. (2002). Up and away. Journal of Staff Development, 23(2), 10–13.
Senge, P. (1990). The fifth discipline: The art and practice of the learning organization. New York: Doubleday.
Stiggins, R. (2002). Assessment crisis: The absence of assessment FOR learning. Phi Delta Kappan, 83(10), 758–765.
Stigler, J. W., & Hiebert, J. (1999). The teaching gap: Best ideas from the world's teachers for improving education in the classroom. New York: Free Press.
Wiggins, G. (1994). None of the above. The Executive Educator, 16(7), 14–18.



A Reader's Guide to Scientifically Based Research

Learning how to assess the validity of education research is vital for creating effective, sustained reform.
Robert E. Slavin

In every successful, dynamic part of our economy, evidence is the force that drives change. In medicine, researchers continually develop medications and procedures, compare them with current drugs and practices, and if they produce greater benefits, disseminate them widely. In agriculture, researchers develop and test better seeds, equipment, and farming methods. In technology, in engineering, in field after field, progress comes from research and development. Physicians, farmers, consumers, and government officials base key decisions on the results of rigorous research.

In education reform, on the other hand, research has played a relatively minor role. Untested innovations appear, are widely embraced, and then disappear as their unrealistic claims fail to materialize. We then replace them with equally untested innovations diametrically opposed in philosophy, in endless swings of the reform pendulum. Far more testing goes into our students' hair gel and acne cream than into most of the curricula or instructional methods teachers use. Yet which of these is more important to our students' future?

Evidence-Based Reform
At long last, education reform may be entering an era of well-researched programs and practices (Slavin, 2002). The U.S. government is now interested in the research base for programs that schools adopt. The Comprehensive School Reform Demonstration legislation of 1997 gives grants to schools to adopt “proven, comprehensive” reform designs. Ideally, “proven” means that programs have been evaluated in “scientifically based research,” which is defined as “rigorous, systematic, and objective procedures to obtain valid knowledge” (U.S. Department of Education, 1998). The emphasis is on evaluations that use experimental or quasi-experimental designs, preferably with random assignment. The Bush administration's No Child Left Behind Act mentions “scientifically based research” 110 times in references to Reading First programs for grades K-3, Early Reading First for preK, Title I school improvement programs, and many more. In each case, schools, districts, and states must justify the programs that they expect to implement under federal funding.

Judging the Validity of Education Research
The new policies that base education funding and practice on scientifically based, rigorous research have important consequences for educators. Research matters. Educators have long given lip service to research as a guide to practice. But increasingly, they are being asked to justify their choices of programs and practices using the findings of rigorous, experimental research.

Why is one study valid whereas another is not? There are many valid forms of research conducted for many reasons, but for evaluating the achievement outcomes of education programs, judging research quality is relatively straightforward. Valid research for this purpose uses meaningful measures of achievement to compare several schools that used a given program with several carefully matched control schools that did not. It's that simple.

Control Groups
A hallmark of valid, scientifically based research on education programs is the use of control groups. In a good study, researchers compare several schools using a given program with several schools not using the program but sharing similar demographics and prior performance, preferably in the same school district. Having at least five schools in each group is desirable; circumstances unique to a given school can bias studies with just one or two schools in each group.

A control group provides an estimate of what students in the experimental program would have achieved if they had been left alone. That's why the control schools must be as similar as possible to the program schools at the outset.

Randomized and Matched Experiments
The most convincing form of a control group comparison is a randomized experiment in which students, teachers, or schools are assigned by chance to a group. For example, the principals and staffs at ten schools might express interest in using a given program. The schools might be paired up and then assigned by a coin flip to the experimental or control group.
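
A minimal sketch of that pair-and-flip procedure, assuming ten hypothetical volunteer schools already sorted into matched pairs (in a real study the pairs would be formed on prior achievement and demographics):

import random

# Hypothetical matched pairs of volunteer schools.
matched_pairs = [
    ("Adams Elementary", "Baker Elementary"),
    ("Cedar Elementary", "Dover Elementary"),
    ("Elm Elementary", "Franklin Elementary"),
    ("Grant Elementary", "Harris Elementary"),
    ("Irving Elementary", "Jordan Elementary"),
]

experimental, control = [], []
for school_a, school_b in matched_pairs:
    # The coin flip: one school in each pair gets the program; the other serves as control.
    if random.random() < 0.5:
        experimental.append(school_a)
        control.append(school_b)
    else:
        experimental.append(school_b)
        control.append(school_a)

print("Program schools:", experimental)
print("Control schools:", control)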

Randomized experiments are very rare in education, but they can be very influential. Perhaps the best known example in recent years is the Tennessee class size study (Achilles, Finn, & Bain, 1997/1998) in which researchers assigned students at random to small classes (15 students), regular classes (20–25 students), or regular classes with an aide. The famous Perry Preschool Program (Berrueta-Clement, Schweinhart, Barnett, Epstein, & Weikart, 1984) assigned four-year-olds at random to attend an enriched preschool program or to stay at home. Two recent studies of James Comer's School Development Project randomly assigned schools to use the School Development Project or keep using their current program (Cook et al., 1999; Cook, Murphy, & Hunt, 2000). In each of these studies, random assignment made it very likely that the experimental and control groups were identical at the outset, so any differences at the end were sure to have resulted from the program.

Matched studies are far more common than randomized ones. In a matched program evaluation, researchers compare students in a given program with those in a control group that is similar in prior achievement, poverty level, demographics, and so on. Matched studies can be valid if the experimental and control groups are very similar. Often, researchers use statistical methods to “control for” pretest differences between experimental and control groups. This can work if the differences are small, but if there are large differences at pretest, statistical controls or use of test-gain scores (calculated by subtracting pretest scores from posttest scores) are generally not adequate.
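
The gain-score calculation mentioned above is simple arithmetic; a brief sketch with invented pretest and posttest scores shows how a matched evaluation might compare average gains rather than raw posttest scores:

# Invented (pretest, posttest) pairs for matched program and control groups.
program = [(48, 61), (52, 66), (45, 57), (60, 71)]
control = [(47, 55), (53, 60), (44, 51), (61, 67)]

def mean_gain(pairs):
    # Gain score = posttest minus pretest, averaged over the group.
    return sum(post - pre for pre, post in pairs) / len(pairs)

print(f"Program mean gain: {mean_gain(program):.1f}")
print(f"Control mean gain: {mean_gain(control):.1f}")
# As noted above, this comparison is trustworthy only when the two groups
# started out with similar pretest scores.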

The potential problem with even the best matched studies is the possibility that the schools that chose a given program have (unmeasured) characteristics different from those of the schools that did not choose it. For example, imagine that a researcher asked 10 schools to implement a new program. Five enthusiastically took it on and five refused. Using the refusing schools as a control group, even if they are similar in other ways, can introduce something called selection bias. In this example, selection bias would work in favor of finding a positive treatment effect because the volunteer schools are more likely to have enthusiastic, energetic teachers willing to try new methods than are the control schools. In other cases, however, the most desperate or dysfunctional schools may have chosen or been assigned to a given program, giving an advantage to the control schools.

Is Random Assignment Essential?
Random assignment to experimental and control groups is the gold standard of research. It virtually eliminates selection bias because students, classes, or schools were assigned to treatments not by their own choice but by the flip of a coin or another random process.

Because randomized studies can rule out selection bias, the U.S. Department of Education and many researchers and policymakers have recently been arguing for a substantial increase in the use of randomized designs in evaluations of education programs. Already, more randomized studies are under way in education than at any other point in history.

The only problem with random assignment is that it is very difficult and expensive to do, especially for schoolwide programs that necessitate random assignment of whole schools. No one likes to be assigned at random, so such studies often have to provide substantial incentives to get educators to participate. Still, such studies are possible; we have such a study under way to evaluate our Success for All comprehensive reform model, and, as noted earlier, Comer's School Development Program has been evaluated in two randomized studies.

At present, with the movement toward greater use of randomized experiments in education in its infancy, educators evaluating the research base for various programs must look carefully at well-matched experiments, valuing those that try to minimize bias by using closely matched experimental and control groups, having adequate numbers of schools, avoiding comparing volunteers with nonvolunteers, and so on.

Statistical and Educational Significance and Sample Size
Reports of education experiments always indicate whether a statistically significant difference exists between the achievement of students in the experimental group and those in the control group, usually controlling for pretests and other factors. A usual criterion is “p < 0.05,” which means that the probability is less than 5 percent that an observed difference might have happened by chance.

The finding that students in a program scored “significantly higher” than students in a control group is important, but it may not be important enough. In a large study, even a small difference can be statistically significant. A typical measure of the size of a program effect is “effect size”: the experimental-control difference divided by the control group's standard deviation (a measure of the dispersion of scores). In education experiments, an effect size of +0.20 (20 percent of a standard deviation) is often considered a minimum for educational significance; effect sizes above +0.50 would be considered very strong.
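
Both quantities are straightforward to compute. The sketch below uses invented scores, SciPy's independent-samples t test for the p-value, and the article's definition of effect size (the experimental-control difference divided by the control group's standard deviation); it is an illustration, not a report of any actual study:

from statistics import mean, stdev
from scipy import stats

# Invented posttest scores for illustration only.
experimental = [74, 81, 69, 77, 85, 72, 79, 68, 83, 76]
control      = [70, 75, 66, 72, 78, 65, 74, 63, 77, 71]

t_stat, p_value = stats.ttest_ind(experimental, control)
effect_size = (mean(experimental) - mean(control)) / stdev(control)

print(f"p-value: {p_value:.3f}")           # "statistically significant" if below 0.05
print(f"effect size: {effect_size:+.2f}")  # +0.20 is a common floor; +0.50 and above is very strong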

But student groupings can have a profound impact on student outcomes. Often, an experiment will compare one school using Program X with one matched control school. If 500 students are in each school, this is a very large experiment. Yet the difference between the Program X school and the control school could be due to any number of factors that have nothing to do with Program X. Perhaps the Program X school has a better principal or a cohesive group of teachers or has been redistricted to include a higher-performing group of students. Perhaps one of the schools experienced a disaster of some sort—in an early study of our Success for All program, Hurricane Hugo blew the roof off of the Success for All school but did not affect the one control school.

Because of the possibility that something unusual that applies to an entire school could affect scores for all the students in that school, statisticians insist on using school means, not individual student scores, in their analyses. In this way, individual school factors are likely to balance out. Statistical requirements would force a researcher to have at least 20–25 schools in each condition. Very few education experiments are this large, however, so the vast majority of experiments analyze at the student level.
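
A short sketch of what analyzing at the school level means in practice, using invented schools and scores: collapse each school's students to a single mean first, then compare the two lists of school means.

from statistics import mean

# Invented student scores, keyed by school, for the two conditions.
program_schools = {
    "School A": [72, 80, 65, 77], "School B": [68, 74, 70, 79],
    "School C": [75, 83, 71, 78], "School D": [66, 73, 69, 76],
    "School E": [70, 77, 74, 81],
}
control_schools = {
    "School F": [69, 75, 63, 72], "School G": [64, 71, 67, 74],
    "School H": [70, 78, 66, 73], "School I": [62, 69, 65, 71],
    "School J": [67, 73, 70, 76],
}

# Each school counts once, so one unusual school cannot dominate the comparison
# the way its hundreds of individual students would.
program_means = [mean(scores) for scores in program_schools.values()]
control_means = [mean(scores) for scores in control_schools.values()]

print("Program school means:", [round(m, 1) for m in program_means])
print("Control school means:", [round(m, 1) for m in control_means])
print(f"Difference of group means: {mean(program_means) - mean(control_means):+.1f}")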

Readers of research must apply a reasonable approach to this problem. We should view studies that observe a single school or class for each condition with great caution. However, a study with as many as five program schools and five control schools probably has enough schools to ensure that a single unusual school will not skew the results. Such a study would still use individual scores, not school means, but it would be far preferable to a comparison between only two schools.

A single study involving a small number of schools or classes may not be conclusive in itself, but many such studies, preferably done by many researchers in a variety of locations, can add confidence that a program's effects are valid. In fact, experimental research in education usually develops in this way. Rather than evaluate one large, definitive study, researchers must usually look at many small studies that may be flawed in various (unbiased) ways. But if these studies tend to find consistent effects, the entire set of studies may produce a meaningful conclusion.


Research to Avoid
All too often, program developers or advocates cite evidence that is of little value or that is downright misleading. A rogue's gallery of such research follows.

Cherry Picking
Frequently, program developers or marketers report on a single school or a small set of schools that made remarkable gains in a given year. Open any education magazine and you'll see an ad like this: “Twelfth Street Elementary went from the 20th percentile to the 60th in only one year!” Such claims have no more validity than advertisements for weight loss programs that tell the story of one person who lost 200 pounds (forgetting to mention the hundreds who did not lose weight on the diet). This kind of “cherry picking” is easy to do in a program that serves many schools; there are always individual schools that make large gains in a given year, and the marketer can pick them after the fact just by looking down a column of numbers to find a big gainer. (Critics of the program can use the same technique to find a big loser.) Such reports are pure puffery, not to be confused with science.

Bottom Fishing
A variant of cherry picking is “bottom fishing,” using an after-the-fact comparison in which an evaluator compares schools using a given program with matched “similar schools” known to have made poor gains in a given year. Researchers can legitimately compare gains made in program schools and gains made in the entire district or state because the large comparison group makes “bottom fishing” impossible. However, readers should interpret with caution after-the-fact studies purporting to compare groups selected by the evaluator.

Pre–Post Studies
Another common but misleading design is the pre–post comparison, lacking a control group. Typically, the designer cites standardized test data, with the rationale that the expected year-to-year gain in percentiles, normal curve equivalents, or percent passing is zero, so any school that gained more than zero has made good progress.

The problem with this logic is that many states and districts make substantial gains in a given year, so the program schools may be doing no better than other schools. In particular, states usually make rapid gains in the years after they adopt a new test. At a minimum, studies should compare gains made in program schools in a given district or state with the gains made in the entire district or state.
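
A toy illustration of that caution, with invented numbers: a pre-post study credits the program school with its entire gain, while a comparison against the statewide trend shows how much of that gain the program can actually claim.

# Invented year-over-year changes in percent passing.
program_school_gain = 6.0   # the Program X school rose from 54% to 60% passing
state_average_gain = 5.0    # but the state as a whole rose from 55% to 60%

print(f"Gain a pre-post study would attribute to the program: {program_school_gain:+.1f} points")
print(f"Gain relative to the statewide trend:                 {program_school_gain - state_average_gain:+.1f} points")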

Scientifically Based Versus Rigorously Evaluated
A key issue in the recent No Child Left Behind legislation is the distinction between programs that are “based on scientifically based research” and those that have been evaluated in valid scientific experiments. A program can be “based on scientifically based research” if it incorporates the findings of rigorous experimental research. For example, reading programs are eligible for funding under the federal Reading First initiative if states determine that they incorporate a focus on five elements of effective reading instruction: phonemic awareness, phonics, fluency, vocabulary, and comprehension. The National Reading Panel (1999) identified these elements as having been established in rigorous research, especially in randomized experiments. Yet there is a big difference between a program based on such elements and a program that has itself been compared with matched or randomly assigned control groups. We can easily imagine a reading program that would incorporate the five elements but whose training was so minimal that teachers did not implement these elements well, or whose materials were so boring that students were not motivated to study them.

The No Child Left Behind guidance (U.S. Department of Education, 2002) recognizes this distinction and notes a preference for programs that have been rigorously evaluated, but also recognizes that requiring such evaluations would screen out many new reading programs that have not been out long enough to have been evaluated, and so allows for their use. This approach may make sense from a pragmatic or political perspective, but from a research perspective, a program that is unevaluated is unevaluated, whether or not it is “based on” scientifically based research. A basis in scientifically based research makes a program promising, but not proven.

Research Reviews
In order to judge the research base for a given program, it is not necessary for every teacher, principal, or superintendent to carry out his or her own review of the literature. Several reviews applying consistent standards have summarized the evidence on various programs.

For comprehensive school reform models, for example, the American Institutes for Research published a review of 24 programs (Herman, 1999). The Thomas Fordham Foundation (Traub, 1999) commissioned an evaluation of 10 popular comprehensive school reform models. And Borman, Hewes, Rachuba, and Brown (2002) carried out a meta-analysis (or quantitative synthesis) of research on 29 comprehensive school reform models.

Research reviews facilitate the process of evaluating the evidence behind a broad range of programs, but it's still a good idea to look for a few published studies on a program to get a sense of the nature and quality of the evidence supporting a given model. Also, we should look at multiple reviews because researchers differ in their review criteria, conclusions, and recommendations. Adopting a program for a single subject, much less for an entire school, requires a great deal of time, money, and work—and can have a profound impact on a school for a long time. Taking time to look at the research evidence with some care before making such an important decision is well worth the effort. Accepting the developer's word for a program's research base is not a responsible strategy.

How Evidence-Based Reform Will Transform Our Schools
The movement to ask schools to adopt programs that have been rigorously researched could have a profound impact on the practice of education and on the outcomes of education for students. If this movement prevails, educators will increasingly be able to choose from among a variety of models known to be effective if well implemented, rather than reinventing (or misinventing) the wheel in every school. There will never be a guarantee that a given program will work in a given school, just as no physician can guarantee that a given treatment will work in every case. A focus on rigorously evaluated programs, however, can at least give school staffs confidence that their efforts to implement a new program will pay off in higher student achievement.

In an environment of evidence-based reform, developers and researchers will continually work to create new models and improve existing ones. Today's substantial improvement will soon be replaced by something even more effective. Rigorous evaluations will be common, both to replicate evaluations of various models and to discover the conditions necessary to make programs work. Reform organizations will build capacity to serve thousands of schools. Education leaders will become increasingly sophisticated in judging the adequacy of research, and, as a result, the quality and usefulness of research will grow. In programs such as Title I, government support will focus on helping schools adopt proven programs, and schools making little progress toward state goals may be required to choose from among a set of proven programs.

Evidence-based reform could finally bring education to the point reached early in the 20th century by medicine, agriculture, and technology, fields in which evidence is the lifeblood of progress. No Child Left Behind, Reading First, Comprehensive School Reform, and related initiatives have created the possibility that evidence-based reform can be sustained and can become fundamental to the practice of education. Informed education leaders can contribute to this effort. It is ironic that the field of education has embraced ideology rather than knowledge in its own reform process. Evidence-based reform honors the best traditions of our profession and promises to transform schooling for all students.

References
Achilles, C. M., Finn, J. D., & Bain, H. P. (1997/1998). Using class size to reduce the equity gap. Educational Leadership, 55(4), 40–43.

Berrueta-Clement, J. R., Schweinhart, L. J., Barnett, W. S., Epstein, A. S., & Weikart, D. P. (1984). Changed lives. Ypsilanti, MI: High/Scope.

Borman, G. D., Hewes, G. M., Rachuba, L. T., & Brown, S. (2002). Comprehensive school reform and student achievement: A meta-analysis. Submitted for publication. (Available from the author at gborman@education.wisc.edu)

Cook, T. D., Habib, F., Phillips, M., Settersten, R. A., Shagle, S., & Degirmencioglu, M. (1999). Comer's school development program in Prince George's County, Maryland: A theory-based evaluation. American Educational Research Journal, 36(3), 543–597.

Cook, T., Murphy, R. F., & Hunt, H. D. (2000). Comer's school development program in Chicago: A theory-based evaluation. American Educational Research Journal, 37(2), 543–597.

Herman, R. (1999). An educators' guide to schoolwide reform. Arlington, VA: Educational Research Service.

National Reading Panel. (1999). Teaching children to read. Washington, DC: U.S. Department of Education.

Slavin, R. E. (2002). Evidence-based education policies: Transforming educational practice and research. Educational Researcher, 31(7), 15–21.

Traub, J. (1999). Better by design? A consumer's guide to schoolwide reform. Washington, DC: Thomas Fordham Foundation.

U.S. Department of Education. (1998). Guidance on the comprehensive school reform demonstration program. Washington, DC: Author.

U.S. Department of Education. (2002). Draft guidance on the comprehensive school reform program (June 14, 2002 update). Washington, DC: Author.

Author's Note: This article was written under funding from the Office of Educational Research and Improvement, U.S. Department of Education. Any opinions expressed are those of the author and do not necessarily represent OERI positions or policies.







Realizing the Promise of Standards-Based Education

To avoid curricular chaos, educators must be judicious about the standards they assess.
Mike Schmoker and Robert J. Marzano
The standards movement is arguably a major force in education today, and some researchers assert that its significance will be huge. Glaser and Linn (1993), for example, predict that historians will identify the last decade of this century as the time when a concentrated press for national education standards emerged (p. xiii).
But will the standards movement endure? And if it does, will it contribute significantly to higher achievement? We believe it will—but only if we rein in its most excessive tendencies. Those tendencies can be seen in the nature and length of state and professional standards documents—and in their unintended consequences.

The Promise of the Standards Movement

Make no mistake: The success of any organization is contingent upon clear, commonly defined goals. A well-articulated focus unleashes individual and collective energy. And a common focus clarifies understanding, accelerates communication, and promotes persistence and collective purpose (Rosenholtz, 1991). This is the stuff of improvement.
The promise of standards can be seen in places like

· Frederick County, Maryland, where the number of students reaching well-defined and commonly assessed standards rose dramatically, lifting the county from the middle to the highest tier of Maryland school systems. Local assessments were deliberately aligned with the standards embedded in the state assessments.
· Fort Logan Elementary School in Denver, Colorado, where scores rose significantly when teams of teachers analyzed weaknesses in performance relative to grade-level standards. Each team reviewed test data and developed strategies for helping students learn in identified areas of difficulty.
· Lake Havasu City, Arizona, where teams of Title I teachers identified, defined, and focused instruction on common reading skills. Once teachers had a shared language about which skills to concentrate on, they refined strategies and systems to improve instructional quality and consistency. As a result, the percentage of students reading at or above grade level rose from 20 to 35 percent in just one year.
· Glendale Union High School District near Phoenix, Arizona, where teams of teachers have increased student performance for almost every course offered. All district teachers—whether they teach algebra, U.S. history, biology, or senior English—are teaching to the same year-end assessments developed by subject-area teams. The same coordination is happening at Adlai Stevenson High School in Lincolnshire, Illinois, where teacher teams continue to set measurable achievement records on every kind of assessment.
· Amphitheater High School in Tucson, Arizona, where teacher Bill Bendt routinely helps exceptional numbers of students pass advanced placement tests by carefully focusing instruction on the standards made explicit by the AP exam.
How did they get these results? Interestingly, not by focusing on standards contained in state or professional documents. Their efforts preceded those documents. Nonetheless, in each case, teachers knew exactly what students needed to learn, what to teach to, where to improve, and what to work on with colleagues. Clear, common learning standards—manageable in number—promote better results. They are essential to focus and to coherence.
If this is true, then educators face two important questions: (1) Do we already have sufficiently clear standards? and (2) Are state and professional standards documents truly helping us achieve the focus and the coherence that are vital to success? In too many cases, the answer to both questions is no.

Don't We Already Have Standards?

Curiously, most districts already have something that looks like standards. We have curriculums and scope-and-sequence documents for each grade level, course, and subject area. But the perception of a common, coherent program of teaching and learning is a delusion. One of us once sat with a curriculum coordinator, poring through a dense curriculum notebook of the district's grade-by-grade "learner outcomes." The document was years in the making. Nonetheless, when the coordinator was asked what influence the curriculum was having on instruction, she was candid enough to reply, "Probably none." Consultant and author Heidi Hayes Jacobs likes to say that curriculum guides are "well-intended fictions." Her conclusion is that the current system actually encourages teachers to simply teach what they like to teach.
It is time to admit that at the ground level, where teachers teach and students learn, there is not coherence, but chaos. The chief problem is that there is simply too much to teach—arguably two to three times too much (Schmidt, McKnight, & Raizen, 1996)—and too many options for what can be taught (Rosenholtz, 1991). There are enormous differences in what teachers teach in the same subject at the same grade level in the same school. Even when common, highly structured textbooks are used as the basis for a curriculum, teachers make independent and idiosyncratic decisions regarding what should be emphasized, what should be added, and what should be deleted (see, for example, Doyle, 1992). Such practices create huge holes in the continuum of content to which students are exposed. In The Learning Gap, researchers Stevenson and Stigler (1992, p. 140) observe that teachers are "daunted by the length of most textbooks." In a system that does little or nothing to help them coordinate priorities, they are forced to select or to omit different topics haphazardly. This only adds to the prevailing chaos.

Standards and School Improvement

The implications of this chaos go to the heart of school improvement. Researcher Susan Rosenholtz found that
The hallmark of any successful organization is a shared sense among its members about what they are trying to accomplish. Agreed-upon goals and ways to attain them enhance the organization's capacity for rational planning and action. (1991, p. 13; our emphasis)
For this reason, she was dismayed to find that schools were unique among organizations in lacking common goals and that the goals of teaching were "multiple, shifting and frequently disputed" (p. 13).
This state of chaos was the rationale for the standards movement—and the most visible and influential manifestations are the state and professional standards documents. Yet these documents themselves have contributed to the very problems they were intended to solve.

The Perils of Standards-Based Education

"Less is more" we keep telling ourselves. Students learn more when we teach less—but teach it well (Dempster, 1993). Nowhere is this principle more obviously violated than in the standards documents. The official documents generated by 49 states and the professional subject-area organizations have had unintended consequences. Commentator Ronald Wolk has found some of them not only to be written in language that is "absurd" but also to contain such quantity that it would take a 10-hour teaching day to cover the material in them (1998).
Because it is easier to add and enlarge than to reduce and refine, we are caught in the snare of having honored (perhaps for political reasons) far too many suggestions for inclusion in the standards documents. We have often failed to place hard but practical limits on the number and the nature of the standards. The result? Bloated and poorly written standards that almost no one can realistically teach to or ever hope to adequately assess. We are making the same mistakes with these documents that we made with our district curriculums.
In the case of standards, quantity is not quality. The irony of the Third International Mathematics and Science Study (TIMSS) shouts at us: Although U.S. mathematics textbooks attempt to address 175 percent more topics than do German textbooks and 350 percent more topics than do Japanese textbooks, both German and Japanese students significantly outperform U.S. students in mathematics. Similarly, although U.S. science textbooks attempt to cover 930 percent more topics than do German textbooks and 433 percent more topics than do Japanese textbooks, both German and Japanese students significantly outperform U.S. students in science achievement as well (Schmidt, McKnight, & Raizen, 1996).
Clearly, U.S. schools would benefit from decreasing the amount of content they try to cover. And teacher morale and self-efficacy improve when we confidently lay out a more manageable number of essential topics to be taught and assessed in greater depth.

Getting Standards Right

Too many of the state standards documents, informed as they are by the professional subject-area standards, have frustrated rather than helped our attempt to provide common focus and clarity for teachers and students. The good news is this: Clear, intelligible standards are a pillar of higher achievement. Aligned with appropriate assessments, they can help us realize the dream of learning for all. They are the heart of the infrastructure for school improvement (Rosenholtz, 1991; Fullan & Stiegelbauer, 1991).

The Standards-Driven School

Consider a school where teachers know exactly what essential skills and knowledge students should learn that year and where they know that their colleagues are teaching to the same manageable standards. Because of this, their fellow teachers can collaborate with them on lessons and units.
This in turn leads to a living bank of proven, standards-referenced instructional materials—lessons, units, and assessments perfected through action research. Both new and veteran teachers can peruse these targeted materials, learning from and adding to the richness of the faculty's repertoire. Because of these rich resources, new and struggling teachers achieve confidence and competence much more rapidly, and experienced teachers have a sense of making a meaningful, ongoing contribution to their craft while being renewed by instructional ideas that are engaging for students. Proven methods, practices, and lessons aligned with established standards become the center of the professional dialogue. Such alignment leads inevitably to better short- and long-term results on local, state, and formative assessments as well as on norm-referenced, alternative, and criterion-referenced assessments.
To create this infrastructure in schools, we can take a few concrete steps:

· Start with the standards that are assessed. Be circumspect about standards that are not assessed. After thoroughly reviewing the state standards documents, we believe that many of them never will be thoroughly assessed. Many of the existing standards that educators are working manically to "cover" will disappear because of their own irrelevancy and imprecision. Expending organized effort on every standard is senseless because many of them will turn out to be ephemeral. Start by focusing teaching on the standards actually contained in current state norm-referenced or criterion-referenced assessments.
As state assessments develop, real priorities become clear. And we must learn all we can about how to teach to these priorities most effectively. Teachers in Colorado, now that they know the reading and writing standards through their experience with the state assessments, are responding in a positive and coordinated fashion. Many schools, like Bessemer Elementary in Pueblo, which has an 80 percent minority population, have realized dramatic gains. At Bessemer, from 1997 to 1998, the percentage of students performing at or above the standard in reading rose from 12 percent to 64 percent; in writing, it rose from 2 percent to 48 percent. Weekly standards-based team meetings made the difference.
State and standardized assessments do not measure everything we deem important, but success on such tests in this age of accountability is vital. Strong standardized scores earn us the trust of our communities as we begin to demonstrate measurable progress on local criterion-referenced and alternative assessments. In districts where improvement on formal, public assessments is of the essence, we should assemble clear lists of the standards and proficiencies that the assessments will require of students. District offices and regional consortiums must take the initiative here: They must assemble representative teams of teachers to develop—and provide every teacher with—a precise, manageable list of the essential, assessed standards.
Every school year, the full faculty should conduct a review of assessment results. Teams of teachers should identify the most pronounced patterns of student weakness, then seek absolute clarity on the nature of these problems. Through staff development and regular, professional collaboration, teachers should focus on these areas, while monitoring progress regularly.
· Beyond state assessments, add judiciously to the list of standards you will teach and assess. For Michael Fullan, "assessment is the coherence-maker" in school improvement (1998, personal communication). Because of the limitations of state and norm-referenced tests, we must develop local and district standards and assessments that take us beyond them. Districts should review the standards documents but then exercise severe discipline in prioritizing on the basis of what students will most need if they are to become reflective thinkers, competent workers, and responsible citizens. For every grade or level, pilot your new standards and assessments while asking the question, Are the standards clear, relevant, and not so numerous that they sacrifice depth for breadth? Don't be afraid to do a rough accounting of the time needed to teach each topic.
Adlai Stevenson High School has achieved world-class results in this way. Glendale Union High School District has done a masterful job of successfully concentrating on norm-referenced tests while implementing a coherent system of formative and end-of-course alternative assessments for high school courses. These assessments require students to do investigative science and to write analyses about social and historical issues—all according to clear standards and criteria. These common, teacher-made assessments embody and clarify precisely those thinking and reasoning standards that norm-referenced tests don't adequately assess. The result is an education that ensures a level of both breadth and substance that goes far beyond what is now required of the average high school graduate.
Perhaps the best time to develop such standards-based assessments is summer. Such work doesn't always require enormous amounts of time or resources. In Lake Havasu City, Arizona, educators developed common K–12 assessments in almost every subject area for about $25,000 over a two-year period. They took only four days to prioritize core science standards and generate common K–12 assessments.
· Do not add more topics than can be taught and assessed reasonably and effectively. A key to developing science assessments in Lake Havasu City was following open discussions with fast, fair rank-ordering procedures that used weighted voting to quickly establish priority standards. Because we can expect educators to differ in philosophy and priority, every school employee could benefit from training in the use of these simple decision-making tools.
The tendency toward overload is strong in schools—and crippling to improvement efforts (Fullan & Hargreaves, 1996). A district we know has received high praise for its showcase work developing grade-by-grade benchmarks for the state standards. For 4th grade math, educators developed 210 items to be taught, 125 of which were also to be taught in six to eight other grades. In another district, in another state, there are only 17 items for 4th grade math, and they're written in language that is clear to parents and teachers.
At the local and state levels, we must demand that economy and clarity inform all standards and that they be meaningfully—not just rhetorically—aligned with assessments. Every teacher deserves a clear, manageable, grade-by-grade set of standards and learning benchmarks that make sense and allow a reasonable measure of autonomy. Anything less is frustrating, inhumane, and counterproductive.
Standards—when we get them right—will give us the results we want. But this will require hard-headed, disciplined effort. The lesson of TIMSS should considerably diminish the perceived risk of downsizing the curriculum. The very nature of organizations argues that we succeed when all parties are rowing in the same direction. We will realize the promise of school reform when we establish standards and expectations for reaching them that are clear, not confusing; essential, not exhaustive. The result will be a new coherence and a shared focus that could be the most propitious step we can take toward educating all students well.


McREL Researches Curriculum-Based Reform

Curriculum-based reform, which aligns curriculum with content and performance standards, is sweeping education systems. But what makes curriculum-based reform effective? The Mid-continent Regional Educational Laboratory (McREL) is heading a series of studies to survey the implementation of this reform approach and its impact on student achievement.
McREL researchers have identified four state-level components for successful curriculum-based reform: an ongoing standards review, a professional development plan, an assessment program, and an accountability system. Although 80 percent of states reported that they impose sanctions when school or district assessment results are low, only 55 percent of states reported that assessment is tightly aligned to standards. And more than 45 states require that all students meet standards and participate in standards-based assessment projects.
The reports "Curriculum Reform: What State Officials Say Works" and "Taking Stock of States' Curriculum-Based Reform Efforts" are available from McREL, Curriculum, Learning and Instruction Project, 2550 South Parker Rd., Ste. 500, Aurora, CO 80014-1678 (Web site: www.mcrel.org).




References

Dempster, F. N. (1993). Exposing our students to less should help them learn. Phi Delta Kappan, 74(6), 432–437.
Doyle, W. (1992). Curriculum and pedagogy. In P. W. Jackson (Ed.), Handbook of research in curriculum (pp. 486–516). New York: Macmillan.
Fullan, M., & Hargreaves, A. (1996). What's worth fighting for in your school? New York: Teachers College Press.
Fullan, M., & Stiegelbauer, S. (1991). The new meaning of educational change. New York: Teachers College Press.
Glaser, R., & Linn, R. (1993). Foreword. In L. Shepard (Ed.), Setting performance standards for student achievement (pp. xiii–xiv). Stanford, CA: National Academy of Education, Stanford University.
Rosenholtz, S. J. (1991). Teacher's workplace: The social organization of schools. New York: Teachers College Press.
Schmidt, W. H., McKnight, C. C., & Raizen, S. A. (1996). Splintered vision: An investigation of U.S. science and mathematics education: Executive summary. Lansing, MI: U.S. National Research Center for the Third International Mathematics and Science Study, Michigan State University.
Stevenson, H. W., & Stigler, J. W. (1992). The learning gap: Why our schools are failing and what we can learn from Japanese and Chinese education. New York: Summit.
Wolk, R. (1998). Doing it right. Teacher Magazine, 10(1), 6.


----

Mike Schmoker (e-mail: info@mikeschmoker.com) is author of Results: The Key to Continuous School Improvement (ASCD, 1996). Robert J. Marzano is coauthor of A Comprehensive Guide to Designing Standards-Based Districts, Schools, and Classrooms (ASCD/McREL, 1997) and Content Knowledge: A Compendium of Standards and Benchmarks for K–12 Education (McREL/ASCD, 1996). Schmoker is Senior Consultant, School Improvement, and Marzano is Senior Fellow for McREL, 2550 S. Parker Rd., Ste. 500, Aurora, CO 80014-1678.


Mike Schmoker is an educational speaker and consultant; info@mikeschmoker.com. His most recent book is The RESULTS Fieldbook: Practical Strategies from Dramatically Improved Schools (ASCD, 2001).